
ISM6564 Fall 2023
© 2023 Murali
Use screen scraping to collect the speeches of all US Presidents, available from the Miller Center's website (https://millercenter.org/the-presidency/presidential-speeches). Add to the corpus the year of each speech, each president’s party affiliation (Democrat, Republican, or Other), and the start and end dates of the president's term (find a website with this information, and use screen scraping to extract the relevant information). Place all fields (including the speech content, party affiliation, and start and end dates) in a CSV file.
Analyze the content of your csv file to answer the following questions.
Which president has the most vocabulary, as evident from their inaugural speeches, and which president has the least vocabulary? On average, do Democratic, Republican, or Other presidents have a higher vocabulary? (2 points)
Create a barplot of presidential vocabulary from the earliest president (Washington) to the latest (Biden) in chronological order. Color code this barplot as blue for Democrat, red for Republican, and gray for Others. (1 point)
What are the five most frequently used words (excluding stop words) for each president? What are the five words used most frequently by all Democratic presidents collectively versus all Republican presidents? (2 points)
What are the key themes (e.g., freedom, liberty, country, etc.) used by each president in their inaugural speech? (3 points)
Compute a sentiment score (positive/negative) for each presidential speech, and draw a barplot of the sentiment of all presidential speeches in chronological order. Again, color code the speeches as blue for Democrat, red for Republican, and gray for Other. Which of these groups has the highest mean sentiment score? Who are the top three presidents with the highest positive sentiment in each group? (2 points)
NOTE1: To receive any marks, you must submit the working code that scrapes and assembles the csv data and analyzes the data to answer the questions above, and the resulting csv file. You must also submit an html export of your notebook that shows that you have successfully run this notebook.
NOTE2: Points will be deducted for copy-and-paste code from the class examples without thinking about their appropriateness for the assignment. Your code must be compact, free of errors, without unnecessary details not asked in the question, using functions and loops as appropriate, and using some comment statements. You will lose points if you fail to adhere to these common coding expectations.
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.firefox import GeckoDriverManager
from bs4 import BeautifulSoup as bs
import re
import dateparser
import numpy as np
from matplotlib import pyplot as plt
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.util import ngrams
from wordcloud import WordCloud
from PIL import Image # opens the mask image for the word cloud; requires the Pillow package
nltk.download('punkt') # sentence tokenizer
nltk.download('stopwords')
nltk.download('wordnet') # WordNet is a lexical database for the English language - used to find the lemma of a word
nltk.download('vader_lexicon') # Valence Aware Dictionary and sEntiment Reasoner
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from collections import Counter
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
# Start a driver session....
# if you have Selenium 4 installed, use one of these:
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # works on Windows and Mac, and should work on Linux when run the first time
#driver = webdriver.Firefox() # use if geckodriver is on your PATH environment variable (or in the same folder as your notebook)
# load page with Selenium
driver.get("https://millercenter.org/the-presidency/presidential-speeches")
driver.implicitly_wait(10) # sticky timeout for element lookups; only needs to be set once per session
pause_scroll = 3 # seconds to pause after each scroll so newly loaded content can appear
previous_page_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause_scroll)
    new_page_height = driver.execute_script("return document.body.scrollHeight")
    if new_page_height == previous_page_height:
        break # the page stopped growing, so all speech links are loaded
    previous_page_height = new_page_height
page_source = driver.page_source
driver.quit() # end the session and close the browser
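The scroll loop above runs until the page height stops changing, which could loop for a long time if the page keeps growing. A minimal sketch of the same idea wrapped in a function with an upper bound on the number of scrolls is shown here; the function name `scroll_to_bottom` and the `max_rounds` parameter are illustrative assumptions, not part of the original notebook.

```python
import time


def scroll_to_bottom(driver, pause=3.0, max_rounds=200):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is any object with a Selenium-style execute_script method.
    Returns the number of scrolls performed.
    """
    height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == height:
            return rounds  # height is stable: nothing more was loaded
        height = new_height
    return max_rounds  # gave up before the height stabilised
```

With a live session this would be called as `scroll_to_bottom(driver, pause=pause_scroll)` before reading `driver.page_source`.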
! pip install lxml
Requirement already satisfied: lxml in c:\users\rmura\anaconda3\envs\text_analytics\lib\site-packages (4.9.2)
#retrieve urls to all speeches
bsobject_linkpage = bs(page_source,'lxml')
bs_links = bsobject_linkpage.find_all("a", href = re.compile('presidential-speeches/'))
bs_links[-1] # display the last link found (the earliest speech)
<a href="https://millercenter.org/the-presidency/presidential-speeches/april-30-1789-first-inaugural-address" target="_blank">April 30, 1789: First Inaugural Address</a>
speech_link_list = [link['href'] for link in bs_links] # collect the URL of every speech
speech_link_list[:5] # display first 5
['https://millercenter.org/the-presidency/presidential-speeches/february-21-2023-remarks-one-year-anniversary-ukraine-war', 'https://millercenter.org/the-presidency/presidential-speeches/february-7-2023-state-union-address', 'https://millercenter.org/the-presidency/presidential-speeches/september-21-2022-speech-77th-session-united-nations-general', 'https://millercenter.org/the-presidency/presidential-speeches/september-1-2022-remarks-continued-battle-soul-nation', 'https://millercenter.org/the-presidency/presidential-speeches/may-24-2022-remarks-school-shooting-uvalde-texas']
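The link list is built from every anchor whose `href` contains `presidential-speeches/`, so the same URL could appear more than once if the page links to a speech in several places. Whether that happens on this page is not shown in the output above; the sketch below is a hedged safeguard, and the helper name `unique_in_order` is hypothetical.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(urls))
```

Applied as `speech_link_list = unique_in_order(speech_link_list)`, this would avoid scraping the same page twice.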
%%time
# Looking at the HTML content, the relevant classes are president-name, episode-date, speed-loc,
# about-sidebar--intro, presidential-speeches--title, and view-transcript.
#
# view-transcript content may contain multiple "Transcript" headers (h3);
# it will also include a title ending in a colon.
#
# scrape each speech
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # start a new session
pause_between_pages = 2 # seconds to wait between page requests
# create empty lists to store data from each page
title, speech, name, date, about = ([] for i in range(5))
for link in speech_link_list:
    # load the speech page with Selenium
    driver.get(link)
    # use Beautiful Soup to parse the HTML
    bsobject_speechpage = bs(driver.page_source, 'lxml')
    # scrape the speech text, title, president's name, date of the speech, and the text about the speech
    # (find() returns None when an element is missing, so .text raises AttributeError)
    try:
        title.append(bsobject_speechpage.find('h2', class_="presidential-speeches--title").text.strip())
    except AttributeError:
        title.append("No title available")
    try:
        speech_raw = bsobject_speechpage.find('div', class_="transcript-inner").text.strip().replace('\xa0', '')
        speech.append(re.sub(r"Transcript|\n","",speech_raw))
    except AttributeError:
        try: # older pages use the class view-transcript instead of transcript-inner, so try that next
            speech_raw = bsobject_speechpage.find('div', class_="view-transcript").text.strip().replace('\xa0', '')
            speech.append(re.sub(r"Transcript|\n"," ",speech_raw))
        except AttributeError:
            speech.append("No speech available")
    try:
        name.append(bsobject_speechpage.find('p', class_="president-name").text.strip())
    except AttributeError:
        name.append("No name available")
    try:
        date.append(dateparser.parse(bsobject_speechpage.find('p', class_="episode-date").text.strip()))
    except AttributeError:
        date.append("No date available")
    try:
        about.append(bsobject_speechpage.find('div', class_="about-sidebar--intro").text.strip())
    except AttributeError:
        about.append("No info available")
    # pause before getting the next page
    time.sleep(pause_between_pages)
driver.close()
CPU times: total: 19.8 s Wall time: 59min 9s
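The scraping loop repeats the same pattern five times: look up an element by tag and class, take its stripped text, and fall back to a placeholder when the element is missing. A minimal sketch of a helper that captures this pattern follows. The name `safe_text` is hypothetical; it relies only on Beautiful Soup's `find(name, class_=...)` and `get_text()` calls, and it is not the notebook's own method.

```python
def safe_text(soup, tag, css_class, default):
    """Return the stripped text of the first matching element, or `default`.

    `soup` is a parsed page (for example a BeautifulSoup object).
    """
    node = soup.find(tag, class_=css_class)
    if node is None:  # element not present on this page
        return default
    return node.get_text().strip()


# Possible use inside the loop (names taken from the notebook):
# title.append(safe_text(bsobject_speechpage, 'h2',
#                        'presidential-speeches--title', 'No title available'))
# name.append(safe_text(bsobject_speechpage, 'p',
#                       'president-name', 'No name available'))
```

The transcript lookup would still need its two-step fallback (`transcript-inner`, then `view-transcript`), for example by passing `None` as the default and checking the result.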
# save the scraped data to a DataFrame and write it to a CSV file
if len(title) == len(speech) == len(name) == len(date) == len(about):
    speeches_presidents = pd.DataFrame({'name':name,'title':title,'date':date,'about':about,'speech':speech})
    # add a space after each period so sentences joined when newlines were removed are separated again
    speeches_presidents['speech'] = speeches_presidents['speech'].apply(lambda x: x.replace(".",". "))
    speeches_presidents.to_csv("./data/presidential_speeches.csv",index=False)
else:
    print("Something went wrong with scraping the speeches. Please check the code.")
    # dump each list to its own CSV file for debugging
    df_names = pd.DataFrame({'name':name})
    df_names.to_csv("./data/names.csv",index=False)
    df_titles = pd.DataFrame({'title':title})
    df_titles.to_csv("./data/titles.csv",index=False)
    df_dates = pd.DataFrame({'date':date})
    df_dates.to_csv("./data/dates.csv",index=False)
    df_infos = pd.DataFrame({'about':about})
    df_infos.to_csv("./data/about.csv",index=False)
    df_speeches = pd.DataFrame({'speech':speech})
    df_speeches.to_csv("./data/speeches.csv",index=False)
speeches_presidents.head()
| | name | title | date | about | speech |
|---|---|---|---|---|---|
| 0 | Joe Biden | February 21, 2023: Remarks on the One-Year Ann... | 2023-02-21 | Speaking at the Royal Castle in Warsaw, Poland... | THE PRESIDENT: Hello, Poland! One of our great... |
| 1 | Joe Biden | February 7, 2023: State of the Union Address | 2023-02-07 | In his State of the Union Address, President J... | Mr. Speaker. Madam Vice President. Our Firs... |
| 2 | Joe Biden | September 21, 2022: Speech before the 77th Ses... | 2022-09-21 | President Joe Biden addresses the 77th session... | Thank you. Mr. President, Mr. Secretary-Gene... |
| 3 | Joe Biden | September 1, 2022: Remarks on the Continued Ba... | 2022-09-01 | President Joe Biden speaks in Philadelphia, Pe... | THE PRESIDENT: My fellow Americans, please, if... |
| 4 | Joe Biden | May 24, 2022: Remarks on School Shooting in Uv... | 2022-05-24 | President Biden makes an impassioned plea to s... | Good evening, fellow Americans. I had hoped, w... |
# here we scrape information on each president's term and party
#
# NOTE: Britannica seems to be attempting to block web scrapers. When this happens, a regular requests
# approach will fail. To bypass this, you will need to use Selenium.
# The following code should work if the site is blocking scrapers:
# Start a driver session....
# if you have Selenium 4 installed, use one of these:
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # works on Windows and Mac, and should work on Linux when run the first time
#driver = webdriver.Firefox() # use if geckodriver is on your PATH environment variable (or in the same folder as your notebook)
driver.get("https://www.britannica.com/topic/Presidents-of-the-United-States-1846696")
driver.implicitly_wait(10)
page_source = driver.page_source
driver.close()
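# pandas read_html parses the tables in the downloaded page; the first table lists the presidents
from io import StringIO # newer pandas versions deprecate passing a literal HTML string to read_html
presidents = pd.read_html(StringIO(page_source))[0]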
presidents
| | Unnamed: 0 | no. | president | birthplace | political party | term |
|---|---|---|---|---|---|---|
| 0 | NaN | 1 | George Washington | Va. | Federalist | 1789–97 |
| 1 | NaN | 2 | John Adams | Mass. | Federalist | 1797–1801 |
| 2 | NaN | 3 | Thomas Jefferson | Va. | Democratic-Republican | 1801–09 |
| 3 | NaN | 4 | James Madison | Va. | Democratic-Republican | 1809–17 |
| 4 | NaN | 5 | James Monroe | Va. | Democratic-Republican | 1817–25 |
| 5 | NaN | 6 | John Quincy Adams | Mass. | National Republican | 1825–29 |
| 6 | NaN | 7 | Andrew Jackson | S.C. | Democratic | 1829–37 |
| 7 | NaN | 8 | Martin Van Buren | N.Y. | Democratic | 1837–41 |
| 8 | NaN | 9 | William Henry Harrison | Va. | Whig | 1841* |
| 9 | NaN | 10 | John Tyler | Va. | Whig | 1841–45 |
| 10 | NaN | 11 | James K. Polk | N.C. | Democratic | 1845–49 |
| 11 | NaN | 12 | Zachary Taylor | Va. | Whig | 1849–50* |
| 12 | NaN | 13 | Millard Fillmore | N.Y. | Whig | 1850–53 |
| 13 | NaN | 14 | Franklin Pierce | N.H. | Democratic | 1853–57 |
| 14 | NaN | 15 | James Buchanan | Pa. | Democratic | 1857–61 |
| 15 | NaN | 16 | Abraham Lincoln | Ky. | Republican | 1861–65* |
| 16 | NaN | 17 | Andrew Johnson | N.C. | Democratic (Union) | 1865–69 |
| 17 | NaN | 18 | Ulysses S. Grant | Ohio | Republican | 1869–77 |
| 18 | NaN | 19 | Rutherford B. Hayes | Ohio | Republican | 1877–81 |
| 19 | NaN | 20 | James A. Garfield | Ohio | Republican | 1881* |
| 20 | NaN | 21 | Chester A. Arthur | Vt. | Republican | 1881–85 |
| 21 | NaN | 22 | Grover Cleveland | N.J. | Democratic | 1885–89 |
| 22 | NaN | 23 | Benjamin Harrison | Ohio | Republican | 1889–93 |
| 23 | NaN | 24 | Grover Cleveland | N.J. | Democratic | 1893–97 |
| 24 | NaN | 25 | William McKinley | Ohio | Republican | 1897–1901* |
| 25 | NaN | 26 | Theodore Roosevelt | N.Y. | Republican | 1901–09 |
| 26 | NaN | 27 | William Howard Taft | Ohio | Republican | 1909–13 |
| 27 | NaN | 28 | Woodrow Wilson | Va. | Democratic | 1913–21 |
| 28 | NaN | 29 | Warren G. Harding | Ohio | Republican | 1921–23* |
| 29 | NaN | 30 | Calvin Coolidge | Vt. | Republican | 1923–29 |
| 30 | NaN | 31 | Herbert Hoover | Iowa | Republican | 1929–33 |
| 31 | NaN | 32 | Franklin D. Roosevelt | N.Y. | Democratic | 1933–45* |
| 32 | NaN | 33 | Harry S. Truman | Mo. | Democratic | 1945–53 |
| 33 | NaN | 34 | Dwight D. Eisenhower | Texas | Republican | 1953–61 |
| 34 | NaN | 35 | John F. Kennedy | Mass. | Democratic | 1961–63* |
| 35 | NaN | 36 | Lyndon B. Johnson | Texas | Democratic | 1963–69 |
| 36 | NaN | 37 | Richard M. Nixon | Calif. | Republican | 1969–74** |
| 37 | NaN | 38 | Gerald R. Ford | Neb. | Republican | 1974–77 |
| 38 | NaN | 39 | Jimmy Carter | Ga. | Democratic | 1977–81 |
| 39 | NaN | 40 | Ronald Reagan | Ill. | Republican | 1981–89 |
| 40 | NaN | 41 | George Bush | Mass. | Republican | 1989–93 |
| 41 | NaN | 42 | Bill Clinton | Ark. | Democratic | 1993–2001 |
| 42 | NaN | 43 | George W. Bush | Conn. | Republican | 2001–09 |
| 43 | NaN | 44 | Barack Obama | Hawaii | Democratic | 2009–17 |
| 44 | NaN | 45 | Donald Trump | N.Y. | Republican | 2017–21 |
| 45 | NaN | 46 | Joe Biden | Pa. | Democratic | 2021– |
| 46 | *Died in office. | *Died in office. | *Died in office. | *Died in office. | *Died in office. | *Died in office. |
| 47 | **Resigned from office. | **Resigned from office. | **Resigned from office. | **Resigned from office. | **Resigned from office. | **Resigned from office. |
# note that the last two rows contain footnotes, not presidents
# let's remove these last two rows...
presidents = presidents.drop(presidents.index[-2:])
presidents
| | Unnamed: 0 | no. | president | birthplace | political party | term |
|---|---|---|---|---|---|---|
| 0 | NaN | 1 | George Washington | Va. | Federalist | 1789–97 |
| 1 | NaN | 2 | John Adams | Mass. | Federalist | 1797–1801 |
| 2 | NaN | 3 | Thomas Jefferson | Va. | Democratic-Republican | 1801–09 |
| 3 | NaN | 4 | James Madison | Va. | Democratic-Republican | 1809–17 |
| 4 | NaN | 5 | James Monroe | Va. | Democratic-Republican | 1817–25 |
| 5 | NaN | 6 | John Quincy Adams | Mass. | National Republican | 1825–29 |
| 6 | NaN | 7 | Andrew Jackson | S.C. | Democratic | 1829–37 |
| 7 | NaN | 8 | Martin Van Buren | N.Y. | Democratic | 1837–41 |
| 8 | NaN | 9 | William Henry Harrison | Va. | Whig | 1841* |
| 9 | NaN | 10 | John Tyler | Va. | Whig | 1841–45 |
| 10 | NaN | 11 | James K. Polk | N.C. | Democratic | 1845–49 |
| 11 | NaN | 12 | Zachary Taylor | Va. | Whig | 1849–50* |
| 12 | NaN | 13 | Millard Fillmore | N.Y. | Whig | 1850–53 |
| 13 | NaN | 14 | Franklin Pierce | N.H. | Democratic | 1853–57 |
| 14 | NaN | 15 | James Buchanan | Pa. | Democratic | 1857–61 |
| 15 | NaN | 16 | Abraham Lincoln | Ky. | Republican | 1861–65* |
| 16 | NaN | 17 | Andrew Johnson | N.C. | Democratic (Union) | 1865–69 |
| 17 | NaN | 18 | Ulysses S. Grant | Ohio | Republican | 1869–77 |
| 18 | NaN | 19 | Rutherford B. Hayes | Ohio | Republican | 1877–81 |
| 19 | NaN | 20 | James A. Garfield | Ohio | Republican | 1881* |
| 20 | NaN | 21 | Chester A. Arthur | Vt. | Republican | 1881–85 |
| 21 | NaN | 22 | Grover Cleveland | N.J. | Democratic | 1885–89 |
| 22 | NaN | 23 | Benjamin Harrison | Ohio | Republican | 1889–93 |
| 23 | NaN | 24 | Grover Cleveland | N.J. | Democratic | 1893–97 |
| 24 | NaN | 25 | William McKinley | Ohio | Republican | 1897–1901* |
| 25 | NaN | 26 | Theodore Roosevelt | N.Y. | Republican | 1901–09 |
| 26 | NaN | 27 | William Howard Taft | Ohio | Republican | 1909–13 |
| 27 | NaN | 28 | Woodrow Wilson | Va. | Democratic | 1913–21 |
| 28 | NaN | 29 | Warren G. Harding | Ohio | Republican | 1921–23* |
| 29 | NaN | 30 | Calvin Coolidge | Vt. | Republican | 1923–29 |
| 30 | NaN | 31 | Herbert Hoover | Iowa | Republican | 1929–33 |
| 31 | NaN | 32 | Franklin D. Roosevelt | N.Y. | Democratic | 1933–45* |
| 32 | NaN | 33 | Harry S. Truman | Mo. | Democratic | 1945–53 |
| 33 | NaN | 34 | Dwight D. Eisenhower | Texas | Republican | 1953–61 |
| 34 | NaN | 35 | John F. Kennedy | Mass. | Democratic | 1961–63* |
| 35 | NaN | 36 | Lyndon B. Johnson | Texas | Democratic | 1963–69 |
| 36 | NaN | 37 | Richard M. Nixon | Calif. | Republican | 1969–74** |
| 37 | NaN | 38 | Gerald R. Ford | Neb. | Republican | 1974–77 |
| 38 | NaN | 39 | Jimmy Carter | Ga. | Democratic | 1977–81 |
| 39 | NaN | 40 | Ronald Reagan | Ill. | Republican | 1981–89 |
| 40 | NaN | 41 | George Bush | Mass. | Republican | 1989–93 |
| 41 | NaN | 42 | Bill Clinton | Ark. | Democratic | 1993–2001 |
| 42 | NaN | 43 | George W. Bush | Conn. | Republican | 2001–09 |
| 43 | NaN | 44 | Barack Obama | Hawaii | Democratic | 2009–17 |
| 44 | NaN | 45 | Donald Trump | N.Y. | Republican | 2017–21 |
| 45 | NaN | 46 | Joe Biden | Pa. | Democratic | 2021– |
# take the part of the term string before the en dash as the start year and store it in a new column called 'start_date'
presidents['start_date'] = presidents['term'].apply(lambda x: dateparser.parse(x.split("–")[0]).year)
# calculate the end year based on the content of the term string
def to_year(row):
    # NOTE: this pattern keeps only digits and ASCII hyphens, so it also strips the en dash (–);
    # the split below therefore yields a single element and end_date equals start_date
    term = re.sub(r"[^\d-]", "", row['term'])
    term_list = term.split("–") # split on the en dash to separate start and end year
    if len(term_list) == 1: # only one value, so it serves as both start and end year
        return row['start_date']
    elif len(term_list) == 2:
        return str(row['start_date'])[:2] + term_list[1] # first two digits of the start year plus the end-year suffix
    else:
        return "bad data"
presidents['end_date'] = presidents.apply(to_year, axis=1)
presidents
| | Unnamed: 0 | no. | president | birthplace | political party | term | start_date | end_date |
|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 1 | George Washington | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1 | NaN | 2 | John Adams | Mass. | Federalist | 1797–1801 | 1797 | 1797 |
| 2 | NaN | 3 | Thomas Jefferson | Va. | Democratic-Republican | 1801–09 | 1801 | 1801 |
| 3 | NaN | 4 | James Madison | Va. | Democratic-Republican | 1809–17 | 1809 | 1809 |
| 4 | NaN | 5 | James Monroe | Va. | Democratic-Republican | 1817–25 | 1817 | 1817 |
| 5 | NaN | 6 | John Quincy Adams | Mass. | National Republican | 1825–29 | 1825 | 1825 |
| 6 | NaN | 7 | Andrew Jackson | S.C. | Democratic | 1829–37 | 1829 | 1829 |
| 7 | NaN | 8 | Martin Van Buren | N.Y. | Democratic | 1837–41 | 1837 | 1837 |
| 8 | NaN | 9 | William Henry Harrison | Va. | Whig | 1841* | 1841 | 1841 |
| 9 | NaN | 10 | John Tyler | Va. | Whig | 1841–45 | 1841 | 1841 |
| 10 | NaN | 11 | James K. Polk | N.C. | Democratic | 1845–49 | 1845 | 1845 |
| 11 | NaN | 12 | Zachary Taylor | Va. | Whig | 1849–50* | 1849 | 1849 |
| 12 | NaN | 13 | Millard Fillmore | N.Y. | Whig | 1850–53 | 1850 | 1850 |
| 13 | NaN | 14 | Franklin Pierce | N.H. | Democratic | 1853–57 | 1853 | 1853 |
| 14 | NaN | 15 | James Buchanan | Pa. | Democratic | 1857–61 | 1857 | 1857 |
| 15 | NaN | 16 | Abraham Lincoln | Ky. | Republican | 1861–65* | 1861 | 1861 |
| 16 | NaN | 17 | Andrew Johnson | N.C. | Democratic (Union) | 1865–69 | 1865 | 1865 |
| 17 | NaN | 18 | Ulysses S. Grant | Ohio | Republican | 1869–77 | 1869 | 1869 |
| 18 | NaN | 19 | Rutherford B. Hayes | Ohio | Republican | 1877–81 | 1877 | 1877 |
| 19 | NaN | 20 | James A. Garfield | Ohio | Republican | 1881* | 1881 | 1881 |
| 20 | NaN | 21 | Chester A. Arthur | Vt. | Republican | 1881–85 | 1881 | 1881 |
| 21 | NaN | 22 | Grover Cleveland | N.J. | Democratic | 1885–89 | 1885 | 1885 |
| 22 | NaN | 23 | Benjamin Harrison | Ohio | Republican | 1889–93 | 1889 | 1889 |
| 23 | NaN | 24 | Grover Cleveland | N.J. | Democratic | 1893–97 | 1893 | 1893 |
| 24 | NaN | 25 | William McKinley | Ohio | Republican | 1897–1901* | 1897 | 1897 |
| 25 | NaN | 26 | Theodore Roosevelt | N.Y. | Republican | 1901–09 | 1901 | 1901 |
| 26 | NaN | 27 | William Howard Taft | Ohio | Republican | 1909–13 | 1909 | 1909 |
| 27 | NaN | 28 | Woodrow Wilson | Va. | Democratic | 1913–21 | 1913 | 1913 |
| 28 | NaN | 29 | Warren G. Harding | Ohio | Republican | 1921–23* | 1921 | 1921 |
| 29 | NaN | 30 | Calvin Coolidge | Vt. | Republican | 1923–29 | 1923 | 1923 |
| 30 | NaN | 31 | Herbert Hoover | Iowa | Republican | 1929–33 | 1929 | 1929 |
| 31 | NaN | 32 | Franklin D. Roosevelt | N.Y. | Democratic | 1933–45* | 1933 | 1933 |
| 32 | NaN | 33 | Harry S. Truman | Mo. | Democratic | 1945–53 | 1945 | 1945 |
| 33 | NaN | 34 | Dwight D. Eisenhower | Texas | Republican | 1953–61 | 1953 | 1953 |
| 34 | NaN | 35 | John F. Kennedy | Mass. | Democratic | 1961–63* | 1961 | 1961 |
| 35 | NaN | 36 | Lyndon B. Johnson | Texas | Democratic | 1963–69 | 1963 | 1963 |
| 36 | NaN | 37 | Richard M. Nixon | Calif. | Republican | 1969–74** | 1969 | 1969 |
| 37 | NaN | 38 | Gerald R. Ford | Neb. | Republican | 1974–77 | 1974 | 1974 |
| 38 | NaN | 39 | Jimmy Carter | Ga. | Democratic | 1977–81 | 1977 | 1977 |
| 39 | NaN | 40 | Ronald Reagan | Ill. | Republican | 1981–89 | 1981 | 1981 |
| 40 | NaN | 41 | George Bush | Mass. | Republican | 1989–93 | 1989 | 1989 |
| 41 | NaN | 42 | Bill Clinton | Ark. | Democratic | 1993–2001 | 1993 | 1993 |
| 42 | NaN | 43 | George W. Bush | Conn. | Republican | 2001–09 | 2001 | 2001 |
| 43 | NaN | 44 | Barack Obama | Hawaii | Democratic | 2009–17 | 2009 | 2009 |
| 44 | NaN | 45 | Donald Trump | N.Y. | Republican | 2017–21 | 2017 | 2017 |
| 45 | NaN | 46 | Joe Biden | Pa. | Democratic | 2021– | 2021 | 2021 |
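In the table above, `end_date` equals `start_date` in every row, although the `term` column shows different end years (for example `1789–97`). One likely cause is that the cleanup regex removes the en dash before the string is split. The following is a minimal sketch of a parser that keeps the dash and expands abbreviated end years. The function name `parse_term` is hypothetical, and returning `None` for an open-ended term such as `2021–` is an assumption about how a sitting president should be represented.

```python
import re


def parse_term(term):
    """Split a term string such as '1789–97' into (start_year, end_year).

    end_year is None when the term is still open (e.g. '2021–').
    """
    # keep digits, ASCII hyphens and en dashes; drop footnote marks such as '*'
    cleaned = re.sub(r"[^\d\-–]", "", term)
    parts = re.split(r"[\-–]", cleaned)
    start = parts[0]
    if len(parts) == 1:  # single-year term, e.g. '1841*'
        return int(start), int(start)
    end = parts[1]
    if not end:  # open-ended term, e.g. '2021–'
        return int(start), None
    if len(end) < 4:  # abbreviated end year borrows the leading digits of the start year
        end = start[:4 - len(end)] + end
    return int(start), int(end)
```

With the notebook's DataFrame this could be applied as `presidents['term'].apply(parse_term)` and the resulting tuples unpacked into `start_date` and `end_date`; the tables shown in this notebook were produced without it.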
#C:\Users\rmura\Murali\Text Analytics\week3\data
presidents.to_csv("./data/presidential_party_and_term.csv", index=False)
speeches = pd.read_csv("./data/presidential_speeches.csv")
parties = pd.read_csv("./data/presidential_party_and_term.csv")
# change the column name of the presidents dataframe to match the speeches dataframe
# In the presidents dataframe, the column name is 'president'. In the speeches dataframe, the column name is 'name'
parties = parties.rename(columns={'president':'name'})
speeches.head()
| | name | title | date | about | speech |
|---|---|---|---|---|---|
| 0 | Joe Biden | February 21, 2023: Remarks on the One-Year Ann... | 2023-02-21 | Speaking at the Royal Castle in Warsaw, Poland... | THE PRESIDENT: Hello, Poland! One of our great... |
| 1 | Joe Biden | February 7, 2023: State of the Union Address | 2023-02-07 | In his State of the Union Address, President J... | Mr. Speaker. Madam Vice President. Our Firs... |
| 2 | Joe Biden | September 21, 2022: Speech before the 77th Ses... | 2022-09-21 | President Joe Biden addresses the 77th session... | Thank you. Mr. President, Mr. Secretary-Gene... |
| 3 | Joe Biden | September 1, 2022: Remarks on the Continued Ba... | 2022-09-01 | President Joe Biden speaks in Philadelphia, Pe... | THE PRESIDENT: My fellow Americans, please, if... |
| 4 | Joe Biden | May 24, 2022: Remarks on School Shooting in Uv... | 2022-05-24 | President Biden makes an impassioned plea to s... | Good evening, fellow Americans. I had hoped, w... |
parties.head()
| | Unnamed: 0 | no. | name | birthplace | political party | term | start_date | end_date |
|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 1 | George Washington | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1 | NaN | 2 | John Adams | Mass. | Federalist | 1797–1801 | 1797 | 1797 |
| 2 | NaN | 3 | Thomas Jefferson | Va. | Democratic-Republican | 1801–09 | 1801 | 1801 |
| 3 | NaN | 4 | James Madison | Va. | Democratic-Republican | 1809–17 | 1809 | 1809 |
| 4 | NaN | 5 | James Monroe | Va. | Democratic-Republican | 1817–25 | 1817 | 1817 |
import difflib # provides a 'fuzzy' match between the president names found in each table
# convert each name in parties to the name it most closely matches in speeches
speech_names = speeches['name'].unique().tolist() # compare against each distinct name once
parties['name'] = parties['name'].apply(lambda x: difflib.get_close_matches(x, speech_names)[0])
# merge the DataFrames on the shared 'name' column
merged = speeches.merge(parties, on='name')
# view final DataFrame
merged
| | name | title | date | about | speech | Unnamed: 0 | no. | birthplace | political party | term | start_date | end_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joe Biden | February 21, 2023: Remarks on the One-Year Ann... | 2023-02-21 | Speaking at the Royal Castle in Warsaw, Poland... | THE PRESIDENT: Hello, Poland! One of our great... | NaN | 46 | Pa. | Democratic | 2021– | 2021 | 2021 |
| 1 | Joe Biden | February 7, 2023: State of the Union Address | 2023-02-07 | In his State of the Union Address, President J... | Mr. Speaker. Madam Vice President. Our Firs... | NaN | 46 | Pa. | Democratic | 2021– | 2021 | 2021 |
| 2 | Joe Biden | September 21, 2022: Speech before the 77th Ses... | 2022-09-21 | President Joe Biden addresses the 77th session... | Thank you. Mr. President, Mr. Secretary-Gene... | NaN | 46 | Pa. | Democratic | 2021– | 2021 | 2021 |
| 3 | Joe Biden | September 1, 2022: Remarks on the Continued Ba... | 2022-09-01 | President Joe Biden speaks in Philadelphia, Pe... | THE PRESIDENT: My fellow Americans, please, if... | NaN | 46 | Pa. | Democratic | 2021– | 2021 | 2021 |
| 4 | Joe Biden | May 24, 2022: Remarks on School Shooting in Uv... | 2022-05-24 | President Biden makes an impassioned plea to s... | Good evening, fellow Americans. I had hoped, w... | NaN | 46 | Pa. | Democratic | 2021– | 2021 | 2021 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1086 | George Washington | December 29, 1790: Talk to the Chiefs and Coun... | 1790-12-29 | The President reassures the Seneca Nation that... | I the President of the United States, by my... | NaN | 1 | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1087 | George Washington | December 8, 1790: Second Annual Message to Con... | 1790-12-08 | Washington focuses on commerce in his second a... | Fellow citizens of the Senate and House of ... | NaN | 1 | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1088 | George Washington | January 8, 1790: First Annual Message to Congress | 1790-01-08 | In a wide-ranging speech, President Washington... | Fellow Citizens of the Senate and House of R... | NaN | 1 | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1089 | George Washington | October 3, 1789: Thanksgiving Proclamation | 1789-10-03 | At the request of Congress, Washington establi... | Whereas it is the duty of all Nations to ack... | NaN | 1 | Va. | Federalist | 1789–97 | 1789 | 1789 |
| 1090 | George Washington | April 30, 1789: First Inaugural Address | 1789-04-30 | President George Washington calls on Congress ... | Fellow Citizens of the Senate and the House ... | NaN | 1 | Va. | Federalist | 1789–97 | 1789 | 1789 |
1091 rows × 12 columns
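Taking the first fuzzy match without checking it can silently map two different presidents to the same name. For instance, if the speech table lists the 41st president as "George H. W. Bush" (an assumption; the name is not shown in the output above), then Britannica's "George Bush" scores closer to "George W. Bush". The party table also has two Grover Cleveland rows, so a merge on name alone would repeat each of his speeches. The sketch below, with hypothetical helpers `match_names` and `collisions`, shows one way to surface such cases before merging.

```python
import difflib


def match_names(source_names, target_names, cutoff=0.8):
    """Map each source name to its closest target name, or None if none clears the cutoff."""
    targets = sorted(set(target_names))  # unique candidates only
    mapping = {}
    for name in source_names:
        hits = difflib.get_close_matches(name, targets, n=1, cutoff=cutoff)
        mapping[name] = hits[0] if hits else None
    return mapping


def collisions(mapping):
    """Return target names claimed by more than one source name."""
    claimed = {}
    for source, target in mapping.items():
        if target is not None:
            claimed.setdefault(target, []).append(source)
    return {t: s for t, s in claimed.items() if len(s) > 1}


# Illustrative names only; the real lists come from the two DataFrames
mapping = match_names(
    ["George Bush", "George W. Bush", "Joe Biden"],
    ["George H. W. Bush", "George W. Bush", "Joe Biden", "Joe Biden"],
)
clashes = collisions(mapping)
```

Any entry in `clashes`, or any `None` in `mapping`, would need a manual correction before the merge.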
merged.isna().sum()
name                  0
title                 0
date                  0
about                 1
speech                0
Unnamed: 0         1091
no.                   0
birthplace            0
political party       0
term                  0
start_date            0
end_date              0
dtype: int64
merged.columns
Index(['name', 'title', 'date', 'about', 'speech', 'Unnamed: 0', 'no.',
'birthplace', 'political party', 'term', 'start_date', 'end_date'],
dtype='object')
merged.drop('Unnamed: 0',axis=1,inplace=True)
merged.to_csv("./data/presidential_speeches_merged.csv",index=False)
merged['political party'].value_counts()
political party
Democratic               497
Republican               439
Democratic-Republican     56
Democratic (Union)        31
Whig                      30
Federalist                30
National Republican        8
Name: count, dtype: int64
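The assignment asks for three party groups (Democrat, Republican, or Other), while the scraped column holds seven labels. A minimal sketch of one possible mapping follows. Treating only the exact labels "Democratic" and "Republican" as the two major parties, so that "Democratic-Republican", "National Republican", and "Democratic (Union)" fall under Other, is an assumption that the analyst may wish to revisit. The names `party_group` and `PARTY_COLORS` are hypothetical.

```python
def party_group(party):
    """Collapse a detailed party label into Democrat, Republican, or Other."""
    label = party.strip()
    if label == "Democratic":
        return "Democrat"
    if label == "Republican":
        return "Republican"
    return "Other"  # Federalist, Whig, Democratic-Republican, and so on


# bar colours requested in the assignment
PARTY_COLORS = {"Democrat": "blue", "Republican": "red", "Other": "gray"}
```

With the merged DataFrame this could be used as `merged['party_group'] = merged['political party'].map(party_group)`, and `PARTY_COLORS` can then supply the bar colours for the plots.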
speech_data_pre=pd.read_csv("./data/presidential_speeches_merged.csv")
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64
 10  end_date         1091 non-null   int64
dtypes: int64(3), object(8)
memory usage: 93.9+ KB
# tokenize and normalize the raw text of every speech
stop_words = set(stopwords.words('english')) # build once; set lookups are fast
lemmatizer = WordNetLemmatizer()
toke = []
for x in speech_data_pre.speech:
    # use NLTK (Natural Language Toolkit) to split the text into tokens
    tokens = nltk.word_tokenize(x)
    # keep only alphabetic tokens and make them lowercase
    tokens = [word.lower() for word in tokens if word.isalpha()]
    # remove tokens that are only one character long, and remove stop words
    tokens = [word for word in tokens if len(word) > 1 and word not in stop_words]
    # lemmatize words (lemmatization is a text normalization technique that converts words to their base forms)
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    toke.append(tokens)
speech_data_pre['tokens'] = toke
speech_data_pre.to_csv("./data/Final_presidential_speeches.csv", index=False)
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64
 10  end_date         1091 non-null   int64
 11  tokens           1091 non-null   object
 12  wordCount        1091 non-null   int64
dtypes: int64(4), object(9)
memory usage: 110.9+ KB
speech_data_pre.head()
| | name | title | date | about | speech | no. | birthplace | political party | term | start_date | end_date | tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joe Biden | February 21, 2023: Remarks on the One-Year Ann... | 2023-02-21 | Speaking at the Royal Castle in Warsaw, Poland... | THE PRESIDENT: Hello, Poland! One of our great... | 46 | Pa. | Democratic | 2021– | 2021 | 2021 | [president, hello, poland, one, great, ally, p... |
| 1 | Joe Biden | February 7, 2023: State of the Union Address | 2023-02-07 | In his State of the Union Address, President J... | Mr. Speaker. Madam Vice President. Our Firs... | 46 | Pa. | Democratic | 2021– | 2021 | 2021 | [speaker, madam, vice, president, first, lady,... |
| 2 | Joe Biden | September 21, 2022: Speech before the 77th Ses... | 2022-09-21 | President Joe Biden addresses the 77th session... | Thank you. Mr. President, Mr. Secretary-Gene... | 46 | Pa. | Democratic | 2021– | 2021 | 2021 | [thank, president, fellow, leader, last, year,... |
| 3 | Joe Biden | September 1, 2022: Remarks on the Continued Ba... | 2022-09-01 | President Joe Biden speaks in Philadelphia, Pe... | THE PRESIDENT: My fellow Americans, please, if... | 46 | Pa. | Democratic | 2021– | 2021 | 2021 | [president, fellow, american, please, seat, ta... |
| 4 | Joe Biden | May 24, 2022: Remarks on School Shooting in Uv... | 2022-05-24 | President Biden makes an impassioned plea to s... | Good evening, fellow Americans. I had hoped, w... | 46 | Pa. | Democratic | 2021– | 2021 | 2021 | [good, evening, fellow, american, hoped, becam... |
len(speech_data_pre.tokens[1])
# Number of tokens in each speech (total tokens, not distinct words)
wordCount_eachToken = [len(tokens) for tokens in speech_data_pre.tokens]
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64
 10  end_date         1091 non-null   int64
 11  tokens           1091 non-null   object
dtypes: int64(3), object(9)
memory usage: 102.4+ KB
speech_data_pre['wordCount']=wordCount_eachToken
speech_data_pre['wordCount']
0 1347
1 3765
2 1879
3 1301
4 376
...
1086 631
1087 631
1088 393
1089 201
1090 645
Name: wordCount, Length: 1091, dtype: int64
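The `wordCount` column counts every token in a speech, so it measures length rather than vocabulary. Below is a minimal sketch of a vocabulary measure, assuming that vocabulary means the number of distinct lemmatized tokens. The `vocabulary_size` helper and the toy token lists are hypothetical and not part of the notebook.

```python
def vocabulary_size(tokens):
    """Return the number of distinct tokens in a list of tokens."""
    return len(set(tokens))


# Toy token lists standing in for rows of the `tokens` column (hypothetical data)
sample_tokens = [
    ["fellow", "citizen", "fellow", "nation"],
    ["nation", "nation", "nation"],
]

# Total tokens per row: this is what wordCount measures
total_counts = [len(tokens) for tokens in sample_tokens]
# Distinct tokens per row: a closer proxy for vocabulary
vocab_counts = [vocabulary_size(tokens) for tokens in sample_tokens]

print(total_counts)
print(vocab_counts)
```

On the real data the same helper could be applied with `speech_data_pre['tokens'].apply(vocabulary_size)`, optionally after keeping only the rows whose `title` identifies an inaugural address (this assumes the titles mark those speeches consistently).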
speech_data_pre.count()
name               1091
title              1091
date               1091
about              1090
speech             1091
no.                1091
birthplace         1091
political party    1091
term               1091
start_date         1091
end_date           1091
tokens             1091
wordCount          1091
dtype: int64
speech_data_pre.groupby('name').wordCount.agg('sum')
144416
# Row with the highest word count (idxmax returns an index label, so use .loc)
max_value = speech_data_pre.loc[speech_data_pre['wordCount'].idxmax(), ['name', 'wordCount']]
print("Max Speech president is ", max_value)
# Row with the lowest word count
min_value = speech_data_pre.loc[speech_data_pre['wordCount'].idxmin(), ['name', 'wordCount']]
print("Min Speech President is ", min_value)
Max Speech president is  name         Abraham Lincoln
wordCount              14614
Name: 875, dtype: object
Min Speech President is  name         George Washington
wordCount                   58
Name: 1081, dtype: object
speech_data_pre.groupby('name').wordCount.agg('sum')
'DemocraticRepublicanDemocratic (Union)WhigNational RepublicanDemocratic-RepublicanFederalist'
speech_data_pre.groupby(['political party']).wordCount.agg('mean')
political party
Democratic               1864.635815
Democratic (Union)       1474.064516
Democratic-Republican    1026.535714
Federalist                718.433333
National Republican      2075.750000
Republican               2069.102506
Whig                     1984.900000
Name: wordCount, dtype: float64
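The assignment groups parties as Democrat, Republican, or Other, while the output above keeps seven separate labels. Below is a sketch of collapsing the labels before averaging. It assumes that every label other than exactly `Democratic` or `Republican` counts as Other, so `Democratic (Union)` and `National Republican` land in Other. The `party_bucket` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def party_bucket(party):
    """Map a raw party label to Democratic, Republican, or Other."""
    return party if party in ("Democratic", "Republican") else "Other"


# (party label, word count) pairs; made-up values for illustration only
rows = [
    ("Democratic", 1000),
    ("Republican", 2000),
    ("Whig", 1500),
    ("Federalist", 500),
    ("Democratic", 3000),
]

# Collect the word counts under their three-way bucket
counts_by_bucket = defaultdict(list)
for party, word_count in rows:
    counts_by_bucket[party_bucket(party)].append(word_count)

# Average within each bucket
mean_by_bucket = {bucket: mean(values) for bucket, values in counts_by_bucket.items()}
print(mean_by_bucket)
```

With pandas, the equivalent would be to add a column such as `speech_data_pre['political party'].map(party_bucket)` and group by that column instead of the raw label.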
col_party={'Democratic':"blue",'Republican':"red",'other':"gray"}
party_colors = speech_data_pre['political party'].map(col_party).fillna("gray")
# Create the barplot
plt.figure(figsize=(12, 6))
plt.barh(speech_data_pre['name'], speech_data_pre['wordCount'], color=party_colors)
# barh puts the values on the x-axis and the names on the y-axis
plt.xlabel("Vocabulary")
plt.ylabel("Presidents")
plt.title("Presidential Vocabulary from Washington to Biden")
plt.xticks(rotation=90)
plt.tight_layout()
# Show the plot or save it to a file
plt.show()
# In this code, we map each speech's party to a color (blue for Democratic, red for Republican, gray for every other party) and use Matplotlib to draw a horizontal bar for each row's wordCount, labeled with the president's name. Finally, we display the plot using plt.show().
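The plot above draws one bar per speech in row order. Below is a sketch of preparing one bar per president in chronological order. It assumes the `no.` column holds the presidency number and can serve as the sort key. The tuples are hypothetical stand-ins for per-president aggregates, and their values are made up.

```python
# (presidency number, name, party, vocabulary) tuples; values are made up
presidents = [
    (46, "Joe Biden", "Democratic", 3200),
    (1, "George Washington", "Federalist", 2100),
    (16, "Abraham Lincoln", "Republican", 4100),
]

color_for_party = {"Democratic": "blue", "Republican": "red"}

# Sort by presidency number so the bars run from Washington to Biden
ordered = sorted(presidents, key=lambda row: row[0])

names = [name for _, name, _, _ in ordered]
values = [vocab for _, _, _, vocab in ordered]
# Any party missing from the mapping falls back to gray
colors = [color_for_party.get(party, "gray") for _, _, party, _ in ordered]

print(names)
print(colors)
```

The three lists can then be passed to `plt.bar(names, values, color=colors)`.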
top_words_by_president = {}
for president, tokens in speech_data_pre.groupby("name")['tokens']:
    # Combine the tokens of all speeches for the president into a single list
    all_speeches = [token for token_list in tokens for token in token_list]
    word_freq = Counter(all_speeches)
    top_words = word_freq.most_common(5)
    # Store the results in the dictionary
    top_words_by_president[president] = top_words
# Print the top five words for each president
for president, top_words in top_words_by_president.items():
    print(f"Top 5 words for {president}:")
    for word, freq in top_words:
        print(f"{word}: {freq}")
    print()
Top 5 words for Abraham Lincoln: state: 609, slavery: 411, would: 333, slave: 318, one: 302
Top 5 words for Andrew Jackson: state: 1332, government: 876, power: 596, may: 558, united: 556
Top 5 words for Andrew Johnson: state: 1341, united: 473, government: 411, law: 391, constitution: 382
Top 5 words for Barack Obama: applause: 1324, people: 928, american: 813, u: 691, year: 689
Top 5 words for Benjamin Harrison: state: 843, government: 667, upon: 643, year: 498, united: 479
Top 5 words for Bill Clinton: people: 1009, american: 775, year: 661, must: 558, america: 526
Top 5 words for Calvin Coolidge: government: 384, country: 255, made: 216, people: 210, would: 196
Top 5 words for Chester A. Arthur: state: 264, government: 231, year: 180, congress: 177, united: 171
Top 5 words for Donald Trump: people: 1210, president: 1195, going: 931, know: 843, want: 834
Top 5 words for Dwight D. Eisenhower: nation: 375, world: 311, must: 307, people: 267, year: 235
Top 5 words for Franklin D. Roosevelt: people: 573, war: 511, nation: 503, government: 471, american: 460
Top 5 words for Franklin Pierce: state: 660, government: 323, united: 320, power: 234, congress: 197
Top 5 words for George W. Bush: america: 1052, people: 1006, american: 882, nation: 794, world: 696
Top 5 words for George Washington: state: 212, united: 157, may: 135, government: 119, nation: 100
Top 5 words for Gerald Ford: american: 202, state: 184, people: 170, congress: 159, nation: 158
Top 5 words for Grover Cleveland: government: 1532, state: 1270, year: 1232, upon: 1096, united: 892
Top 5 words for Harry S. Truman: world: 215, people: 214, nation: 151, united: 138, would: 125
Top 5 words for Herbert Hoover: government: 469, upon: 342, state: 338, people: 308, year: 283
Top 5 words for James A. Garfield: government: 21, people: 20, constitution: 17, law: 15, upon: 13
Top 5 words for James Buchanan: state: 702, government: 445, would: 349, congress: 337, constitution: 296
Top 5 words for James K. Polk: state: 792, government: 573, mexico: 491, united: 462, war: 398
Top 5 words for James Madison: state: 244, united: 184, war: 147, government: 117, public: 117
Top 5 words for James Monroe: state: 355, government: 248, united: 213, great: 210, power: 165
Top 5 words for Jimmy Carter: president: 576, people: 502, would: 468, year: 420, country: 348
Top 5 words for Joe Biden: american: 459, people: 407, president: 369, year: 293, america: 290
Top 5 words for John Adams: state: 117, united: 90, nation: 63, government: 59, country: 55
Top 5 words for John F. Kennedy: world: 558, state: 530, nation: 520, would: 499, country: 483
Top 5 words for John Quincy Adams: state: 220, upon: 193, united: 147, year: 143, congress: 140
Top 5 words for John Tyler: state: 574, government: 384, united: 257, would: 237, may: 222
Top 5 words for Lyndon B. Johnson: president: 1378, people: 1001, would: 929, year: 872, think: 857
Top 5 words for Martin Van Buren: government: 402, state: 400, public: 273, upon: 223, bank: 199
Top 5 words for Millard Fillmore: state: 347, united: 192, government: 166, law: 166, congress: 140
Top 5 words for Richard M. Nixon: american: 314, year: 299, peace: 287, people: 270, would: 229
Top 5 words for Ronald Reagan: people: 915, u: 805, year: 773, government: 718, american: 686
Top 5 words for Rutherford B. Hayes: state: 507, government: 343, united: 327, congress: 265, law: 259
Top 5 words for Theodore Roosevelt: state: 845, government: 703, law: 613, united: 504, would: 480
Top 5 words for Thomas Jefferson: state: 167, may: 162, u: 143, shall: 128, nation: 102
Top 5 words for Ulysses S. Grant: state: 955, united: 620, government: 475, congress: 361, year: 315
Top 5 words for Warren G. Harding: world: 176, american: 144, government: 126, must: 100, republic: 100
Top 5 words for William Harrison: power: 63, government: 44, state: 40, constitution: 37, people: 37
Top 5 words for William McKinley: government: 614, state: 548, united: 402, congress: 277, upon: 258
Top 5 words for William Taft: government: 604, state: 546, country: 349, law: 346, united: 343
Top 5 words for Woodrow Wilson: upon: 386, government: 343, nation: 290, people: 279, must: 270
Top 5 words for Zachary Taylor: state: 100, congress: 60, government: 58, united: 43, treaty: 42
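Some of the top tokens above are arguably noise rather than word choices. Examples are `applause`, a transcript annotation, and `u`, which appears to be the lemmatizer's reduction of "us". Below is a sketch of dropping such tokens before counting, assuming a hand-picked exclusion set. `EXTRA_STOPWORDS`, `top_words_filtered`, and the sample tokens are hypothetical.

```python
from collections import Counter

# Hand-picked tokens to ignore; an assumption, to be adjusted after inspecting the output
EXTRA_STOPWORDS = {"applause", "u", "would", "upon", "may"}


def top_words_filtered(tokens, n=5, exclude=EXTRA_STOPWORDS):
    """Return the n most common tokens that are not in the exclusion set."""
    counts = Counter(token for token in tokens if token not in exclude)
    return counts.most_common(n)


# Toy token list (hypothetical)
sample = ["applause", "people", "applause", "american", "people",
          "u", "people", "american", "year"]
top_two = top_words_filtered(sample, n=2)
print(top_two)
```

Whether words such as `would` or `upon` belong in the exclusion set is a judgment call; the set above is only a starting point.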
top_words_by_party = {}
specific_values = ['Democratic', 'Republican']
filtered_df = speech_data_pre[speech_data_pre['political party'].isin(specific_values)]
for party, tokens in filtered_df.groupby("political party")['tokens']:
    # Combine the tokens of all speeches for the party into a single list
    all_speeches = [token for token_list in tokens for token in token_list]
    word_freq = Counter(all_speeches)
    top_words = word_freq.most_common(5)
    # Store the results in the dictionary
    top_words_by_party[party] = top_words
# Print the top five words for each party
for party, top_words in top_words_by_party.items():
    print(f"Top 5 words for {party}:")
    for word, freq in top_words:
        print(f"{word}: {freq}")
    print()
Top 5 words for Democratic: state: 8077, people: 6985, government: 6542, year: 5895, would: 5100
Top 5 words for Republican: state: 7787, government: 6732, people: 6191, year: 5337, american: 4737
from wordcloud import WordCloud

for president, tokens in speech_data_pre.groupby("name")['tokens']:
    # Combine the tokens of all speeches for the president into a single list
    all_speeches = [token for token_list in tokens for token in token_list]
    # Create n-grams using NLTK
    # n-grams preserve the sequence (and help with the meaning) of words
    bigrams = list(ngrams(all_speeches, 2))  # create a list of bigrams (note that the output is a list of tuples)
    # create a dictionary of bigrams and their counts
    bigram_dict = {}
    for bigram in bigrams:  # iterate through the list of bigrams
        bigram_str = ' '.join(bigram)  # convert the bigram tuple to a string
        # add the bigram with a count of 1 if it is new, otherwise increment its count
        bigram_dict[bigram_str] = bigram_dict.get(bigram_str, 0) + 1
    # create a word cloud of bigrams
    wordcloud = WordCloud(
        width=1000,
        height=1000,
        background_color='white',
        collocations=False,  # must be a boolean; the string 'FALSE' is truthy
        min_font_size=16)
    wordcloud.generate_from_frequencies(bigram_dict)
    plt.figure(figsize=(7, 7))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.title(president)
    plt.show()
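The bigram tally in the loop above can also be written with `collections.Counter`, which additionally yields a ranked list that can be read as candidate themes. Below is a sketch using only the standard library. `top_bigrams` and the sample tokens are hypothetical.

```python
from collections import Counter


def top_bigrams(tokens, n=3):
    """Count adjacent token pairs and return the n most frequent as strings."""
    pairs = zip(tokens, tokens[1:])  # consecutive (w1, w2) pairs
    counts = Counter(" ".join(pair) for pair in pairs)
    return counts.most_common(n)


# Toy token list (hypothetical)
sample = ["united", "state", "government", "united", "state", "people"]
most_frequent = top_bigrams(sample, n=1)
print(most_frequent)
```

Printing the top few bigrams per president gives a textual counterpart to the word clouds, which is easier to compare across presidents.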
nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()
dfs = []
# Loop through groups
for president, group in speech_data_pre.groupby("name"):
    # Combine all speeches for the president into a single text
    all_speeches = " ".join(token for token_list in group['tokens'] for token in token_list)
    # Perform sentiment analysis
    Sentiment = sia.polarity_scores(all_speeches)
    # Get the political party for the president
    party = group['political party'].iloc[0]  # Assuming each president has one political party
    # Create a DataFrame with the results
    result_df = pd.DataFrame({'president name': [president], 'Sentiment score': [Sentiment["compound"]], 'political party': [party]})
    # Append the DataFrame to the list
    dfs.append(result_df)
# Concatenate all DataFrames in the list into one DataFrame
concat_speech = pd.concat(dfs, ignore_index=True)
# Display the resulting DataFrame
print(concat_speech)
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
           president name  Sentiment score        political party
0         Abraham Lincoln           1.0000             Republican
1          Andrew Jackson           1.0000             Democratic
2          Andrew Johnson           1.0000     Democratic (Union)
3            Barack Obama           1.0000             Democratic
4       Benjamin Harrison           1.0000             Republican
5            Bill Clinton           1.0000             Democratic
6         Calvin Coolidge           1.0000             Republican
7       Chester A. Arthur           1.0000             Republican
8            Donald Trump           1.0000             Republican
9    Dwight D. Eisenhower           1.0000             Republican
10  Franklin D. Roosevelt           1.0000             Democratic
11        Franklin Pierce           1.0000             Democratic
12         George W. Bush           1.0000             Republican
13      George Washington           1.0000             Federalist
14            Gerald Ford           1.0000             Republican
15       Grover Cleveland           1.0000             Democratic
16        Harry S. Truman           1.0000             Democratic
17         Herbert Hoover           1.0000             Republican
18      James A. Garfield           0.9998             Republican
19         James Buchanan           1.0000             Democratic
20          James K. Polk           1.0000             Democratic
21          James Madison           1.0000  Democratic-Republican
22           James Monroe           1.0000  Democratic-Republican
23           Jimmy Carter           1.0000             Democratic
24              Joe Biden           1.0000             Democratic
25             John Adams           1.0000             Federalist
26        John F. Kennedy           1.0000             Democratic
27      John Quincy Adams           1.0000    National Republican
28             John Tyler           1.0000                   Whig
29      Lyndon B. Johnson           1.0000             Democratic
30       Martin Van Buren           1.0000             Democratic
31       Millard Fillmore           1.0000                   Whig
32       Richard M. Nixon           1.0000             Republican
33          Ronald Reagan           1.0000             Republican
34    Rutherford B. Hayes           1.0000             Republican
35     Theodore Roosevelt           1.0000             Republican
36       Thomas Jefferson           1.0000  Democratic-Republican
37       Ulysses S. Grant           1.0000             Republican
38      Warren G. Harding           1.0000             Republican
39       William Harrison           1.0000                   Whig
40       William McKinley           1.0000             Republican
41           William Taft           1.0000             Republican
42         Woodrow Wilson           1.0000             Democratic
43         Zachary Taylor           1.0000                   Whig
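Nearly every score above is 1.0000, which makes the presidents indistinguishable. This is consistent with how VADER's `compound` value is understood to be computed: the word valences are summed and then squashed with x / sqrt(x^2 + alpha), with alpha = 15, so a very long text with net positive valence saturates near 1. Below is a sketch reproducing that normalization in plain Python. The `normalize` function mirrors the formula and is not imported from NLTK, and the valence sums are made up.

```python
import math


def normalize(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# A short text with a few positive words (made-up valence sum)
short_text = normalize(4)
# Thousands of concatenated tokens with a large net valence (made-up sum)
long_text = normalize(400)

print(round(short_text, 4))
print(round(long_text, 4))

# Averaging scores of smaller units keeps long texts comparable
sentence_scores = [normalize(v) for v in (4, -2, 0, 3)]
average_score = sum(sentence_scores) / len(sentence_scores)
print(round(average_score, 4))
```

One way to avoid the saturation is to score each speech (or each sentence) separately with `sia.polarity_scores` and average the `compound` values per president, rather than scoring one concatenated text.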
# Parties not listed here are colored gray by fillna below
party_colors = {"Democratic": "blue", "Republican": "red"}
party_color = concat_speech['political party'].map(party_colors).fillna("gray")
# Create the barplot
plt.figure(figsize=(12, 6))
plt.bar(concat_speech.index, concat_speech["Sentiment score"], color=party_color)
# groupby("name") sorts the rows by president name, so the bars are in alphabetical order
plt.xlabel("Presidents (Alphabetical Order)")
plt.ylabel("Sentiment Score")
plt.title("Sentiment of Presidential Speeches")
plt.xticks(concat_speech.index, concat_speech["president name"], rotation=45)
plt.tight_layout()
# Show the plot
plt.show()
concat_speech.groupby('political party')["Sentiment score"].mean()
political party
Democratic               1.000000
Democratic (Union)       1.000000
Democratic-Republican    1.000000
Federalist               1.000000
National Republican      1.000000
Republican               0.999989
Whig                     1.000000
Name: Sentiment score, dtype: float64
grouped_by_party = concat_speech.groupby('political party')
# Create a dictionary to store the top three presidents for each group
top_presidents_by_party = {}
# Find the top three presidents with the highest positive sentiment in each group
for party, group in grouped_by_party:
    # Sort the group by 'Sentiment score' in descending order and take the top three
    top_presidents = group.nlargest(3, 'Sentiment score')
    # Store the top presidents in the dictionary
    top_presidents_by_party[party] = top_presidents
# Print the top three presidents for each group
for party, top_presidents in top_presidents_by_party.items():
    print(f"Top Three Presidents in Party {party}:")
    print(top_presidents)
    print()
Top Three Presidents in Party Democratic:
president name Sentiment score political party
1 Andrew Jackson 1.0 Democratic
3 Barack Obama 1.0 Democratic
5 Bill Clinton 1.0 Democratic
Top Three Presidents in Party Democratic (Union):
president name Sentiment score political party
2 Andrew Johnson 1.0 Democratic (Union)
Top Three Presidents in Party Democratic-Republican:
president name Sentiment score political party
21 James Madison 1.0 Democratic-Republican
22 James Monroe 1.0 Democratic-Republican
36 Thomas Jefferson 1.0 Democratic-Republican
Top Three Presidents in Party Federalist:
president name Sentiment score political party
13 George Washington 1.0 Federalist
25 John Adams 1.0 Federalist
Top Three Presidents in Party National Republican:
president name Sentiment score political party
27 John Quincy Adams 1.0 National Republican
Top Three Presidents in Party Republican:
president name Sentiment score political party
0 Abraham Lincoln 1.0 Republican
4 Benjamin Harrison 1.0 Republican
6 Calvin Coolidge 1.0 Republican
Top Three Presidents in Party Whig:
president name Sentiment score political party
28 John Tyler 1.0 Whig
31 Millard Fillmore 1.0 Whig
39 William Harrison 1.0 Whig
concat_speech.to_csv("./data/Sentimental_score_final.csv")